Home Exchange Presentation¶

https://www.homeexchange.com/

HomeExchange is a new way to travel.

It is a major alternative to Airbnb where you can swap houses. Registration is free, and you only pay the annual membership fee of €160 when you find your first exchange.

Subscribing to the annual membership also gives you access to Guest Points (500GP for the first subscription and 250GP for subsequent ones).

Guest Points were created to enable non-reciprocal exchanges when members cannot find someone to host them on the dates they would like to travel.

Thanks to this system, users are able to organize two types of exchanges on HomeExchange:

  • Reciprocal: User1 hosts the person (User2) whose home User1 is staying in (both User1 and User2 are both the host and the guest).
  • Non-reciprocal: One member of the exchange is not interested in going to the other's home, so they offer a predetermined amount of Guest Points per night for their hospitality. The user can be either the host or the guest. For example, imagine that a German couple wants to come to your home, but you wish to travel to Italy, not Germany. Without Guest Points, a simple reciprocal exchange would not be possible in that case. However, with Guest Points, the German couple can come stay in your home in exchange for Guest Points. You can then use these Guest Points to stay in a home in Italy. Guest Points exist to compensate for the hospitality of a host.

Problem presentation¶

HomeExchange needs to understand why some of their customers leave.

Several variables are available to identify whether seniority, activity as a guest or host, sponsorship, and subscriptions taken through promotions have an impact on the churn rate.

Data description¶

Two datasets are provided by HomeExchange :

  • Exchanges:
    • One line by exchanges initiated or received by users who have register (but not necessary subscribed).
    • If exchanges is reciprocal, only one conversation is created (via the creator_id), but two exchanges are created.
    • Data :
      • conversation_id : id of the conversation (not unique)
      • exchange_id : exchange id which is unique, even in case of reciprocal exchange (in this case conversation_id is the same for both exchange)
      • created_at :date of conversation creation
      • creator_id :user_id of the user that who initiated the conversation
      • guest_user_id : user_id of the (potential) guest
      • host_user_id : user_id of the (potential) host N
      • finalized_at : date when the exchange was finalized (if null, it means that the exchange request was not successful)
      • canceled_at: date when the finalized exchange has been cancelled
      • start_on: date when the exchange starts (can be filled in even if the exchange is not finalized. It corresponds to the requested dates)
      • end_on: date when the exchange ends (can be filled in even if the exchange is not finalized. It corresponds to the requested dates)
      • guest_count : number of people coming
      • night_count: number of nights of the exchange
      • user_cancellation_id: user_id who canceled
      • exchange_type: via Guest Points or reciprocal
      • home_type: house or appartment
      • residence_type: main or secondary house
      • capacity: maximum number of people who can come
      • country: host house country
      • region: host house region
      • department: host house department
      • city: host house city
  • Subscription:
    • One line per subscription (for subscription > 2019).
    • /!\ subscription in 2019 can be a renew of a first subscription older than 2019
    • If a user has subscribed several years, there are as many lines as subscriptions.
    • If renew = 1 , you will find the subscription line and the renewal line for the next year
    • Data :
      • subscription_date : date of subscription of the user (can be a renew (not the first subscription)) N
      • user_id : id of the user
      • renew : did the user re-subscribe the following year (the month of the expiration of his subscription)
      • first_subscription_date : date of the first subscription (can be before 2019)
      • first_subscription : 1 if it's the first subscription of the user
      • referral : 1 if the user has been sponsored
      • promotion : 1 if the user had a promotion for his/her subscription
      • payment3x : 1 if the user has used the 3x payment to subscribe
      • payment2 : 1 if the user hashas paid his/her 2nd payment
      • payment3 : 1 if the user hashas paid his/her 3nd payment
      • country : user country, if null, it might be a renew and the info is not saved here. Info will be retrieved in exchanges data file
      • region : user region, if null, it might be a renew and the info is not saved here. Info will be retrieved in exchanges data file
      • department : city : user city, if null, it might be a renew and the info is not saved here. Info will be retrieved in exchanges data file

Datasets cover period from 2019-01-01 to 2022-11-30. As subscription are for one year, the maximal date subscription date is '2021-10-31'.

Some cleaning and data enrichment was done in SQL (DBT) :

  • Cleaning :
    • Exchange data :
      • Remove duplicated row
      • Remove line were host and guest was the same user
    • Subscription data :
      • Remove duplicated row
      • Remove user were there were more than 3 subscriptions in 3 years (not normal since subscription are annual)
      • Add missing country, city and region info based on Exchange data (where user was host)
  • Enrichment :
    • Subscription data, add the following information :
      • is_zombie : 1 if the user subscribe again after 1 year of absence (due to covid for example)
      • previous_inscription_date : date of the previous subscription
      • inscription_diff_year : difference in year between subscription_date and previous_subscription_date
      • nb_of_inscription_by_users : nb of subscription by users between 2019 and 2021. Can only be <=3 (selection done on this key for removing strange behavior with more than 1 subscription per year)

A new table was created in SQL BigQuery

  • User:
    • In this table, information such as the number of subscription of the user, the number of time the user did an exchange etc.. are summarized by users (one line per user)
    • It is an join between Subscription and Exchange table
    • Data :
      • user_id : user_id, user can be a subscriber or not (in this case first subscription date is null)
      • first_subscription_date : date of the first subscription (is any) of the user
      • last_subscription_date : date of the last subscription (is any between 2019 and 2021) of the user
      • nb_of_inscription : number of subscription of the user between 2019 and 2021 (can be more if user subscribe before 2019)
      • mean_churn_rate : for subscription between 2019 and 2021, give the average churn rate for the user (for example, 3 subscription but only 2 renew, churn rate = 1 - 2/3 = 0.33)
      • referall : 1 if the user were at least referall once, 0 if the user was never refer
      • promotion : 1 if the user used at least one promotion, 0 if the user never used promotion
      • country : country of the user
      • nb_of_conversation_as_host : count the total number of conversation created were the user is host
      • nb_of_conversation_as_host_notFinalized : count the total number of conversation that DID not become an exchange, were the user is host
      • nb_of_exchange_as_host_done : count the total number of conversation that DID become an exchange and WAS NOT CANCEL, were the user is host
      • first_date_exchange_as_host_done : finalized (not date of travel) date of the first exchange of the user as host (between 2019 and 2022, for older user it can be anterior to this)
      • nb_of_exchange_as_host_canceled : count the total number of conversation that DID become an exchange and WAS CANCEL, were the user is host
      • nb_of_conversation_as_guest : count the total number of conversation created were the user is guest
      • nb_of_conversation_as_guest_notFinalized : count the total number of conversation that DID not become an exchange, were the user is guest
      • nb_of_exchange_as_guest_done : count the total number of conversation that DID become an exchange and WAS NOT CANCEL, were the user is guest
      • first_date_exchange_as_host_done : finalized (not date of travel) date of the first exchange of the user as guest (between 2019 and 2022, for older user it can be anterior to this)
      • nb_of_exchange_as_guest_canceled : count the total number of conversation that DID become an exchange and WAS CANCEL, were the user is guest
      • nb_of_conversation_for_exchange : count the total number of conversation created for reciprocal exchange (user is both guest and host)
      • nb_of_conversation_for_exchange_notFinalized : count the total number of conversation that DID NOT become an exchange (user is both guest and host)
      • nb_of_exchange_as_exchange_done : count the total number of conversation that DID become an exchange and WAS NOT cancel for reciprocal exchange (user is both guest and host)
      • first_date_exchange_as_host_done : finalized (not date of travel) date of the first exchange of the user as both guest and host (between 2019 and 2022, for older user it can be anterior to this)
      • nb_of_exchange_as_exchange_canceled : count the total number of conversation that DID become an exchange and WAS cancel for reciprocal exchange (user is both guest and host)
      • total_exchange_done : total of exchange done by one users during the period (2019 - 2022)

Load data¶

Subscription data :

loading subscription dataset....
.... cleaning stage in SQL remove :  4.37 % of data

Exchange data :

loading exchange data
Resample exchange data (monthly)

Users data :

Load Users data
414678

Subscription Analysis¶

We will begin by examining the subscription table, which contains information for each user and their subscriptions. Each subscription is represented by a separate line in the table, and there may be multiple lines for a single user if they have multiple subscriptions.

Analyzing this table will enable us to conduct an initial investigation into the reasons why users may leave after subscribing. We can examine patterns or factors that may contribute to user churn and gain insights into which users are more likely to cancel their subscriptions.

In the following section, we will study the profile of churners using the Users table. By analyzing this table, we can delve deeper into the characteristics and behaviors of users who churn, helping us understand why they cancel their subscriptions.

Churn Rate¶

As renew is indicated in the Subscription table as 1 if the user re-subscribe and 0 if not.

Therefore, the churn rate can be determined as :

CR = 1 - mean(renew)

which is equivalent to calculate the number of time a user that did not re-subscribe divided by the total number of subscribtions.

Total churn rate¶

A total of 33% of churn is experience for the period 2019 - 2022, for all users.

Churn rate by country¶

The churn rate by continent is shown below, as well as the proportion of users that the country represents.

  • South America has the highest churn rate, but with a very low number of users.
  • Europe and North America represent the largest proportion of users.
  • European have a highest churn rate compare to North America. This is due to the fact that North American may tend to take out a subscription before finding an exchange and/or renew without even having taken advantage of their subscription :
    • I suspect that North American does have automatic renew while European have not.
    • This can be verified by looking at the churn rate for USA's users vs. France's users that did 0 exchanges during the all 2019-2022 period:

The churn rate for users that did 0 exchange between 2019 and 2022 is much smaller for USA's users compare to French's users, indicating that USA users have more tendency to use automatic renew.

By looking at users that subscribed for the first time after 2019 and that at least 1 exchange : 
-  69.4 % of USA users subscribed BEFORE doing any exchange
-  61.8 % of French users subscribed BEFORE doing any exchange

We can conclude that Americans may tend to take out a subscription before finding an exchange and/or renew without even having taken advantage of their subscription.

Subscription analysis as function of time¶

Subscriptions are dependant on time (more subscription for example during official holidays like summer holidays for example).

Besides, data are available for 2019 to 2022, during which COVID and its lockdown strike the world. During this period, as expected, number of subscription decrease and churn rate increase.

/!\ Do not forget than there is one year between the choice of the user to renew or not and the subscription date (that's why the churn rate is shifted by one year compared to subscription count)

Some of the user that leaves during COVID can back after, shown as "return" rate in the following chart.

COVID is showing a large impact on churn rate, which can be verified with a statistical test (Fisher exact in this case).

COVID is for the period before 2021-04-15 and POST-COVID is for period after this date

renew 0 1
covid_renew
0_covid 17595 27334
1_post-covid 14015 36686
p_value: 1.23418038102e-312
Result: There is a significant difference between COVID and POST-COVID period

This impact will be keep in mind for the rest of the analysis.

Churn rate as a function of user caracteristics¶

The churn rate as a function of the user caracteristics is important to understand why users are leaving

Is it the first subsription of the user ?¶

In this case, COVID can have a real impact

As expected, the churn rate is smaller for user that did subscribe before (not a first subscription). Besides, if the first subscription was done before or during COVID, then the churn rate is larger than if it was done after COVID (after 2021-06-15), which is perfectly normal since user had more difficulty to travel

As expected, the more the users subscribed in the past, the less they leave during the following year. That means that, when the solution is working (managed to travel), user stay and continue to use it.

Use of promotion¶

When a user uses a promotion for the first subscription, they have to more change to stay than if they did not use a promotion. When a user uses a promotion, but it is not their first subscription, then the behavior is reversed and the churn rate increase in case of use of promotion from 28% to 41%

**HUGE !!! HOW TO EXPLAIN THIS ???????**

Referral¶

Referral do not impact much the churn rate

Exchange Table¶

In the previous section, we have studied subscription and users' characteristics. One other interesting feature to study is the exchange type and number as a function of time.

As expected, during look down, a lot of exchange were canceled as users could not travel. In the same way, the number of conversation and exchange finalizes during COVID period drastically drops.

However, something interesting appears on this graph: the amount of exchange increase after COVID (compare to the period before the first look down). Since the number of inscriptions (see section 3) did not increase that much after COVID, that indicate that the user that already subscribed travel more after COVID! There is alsmot 3 ( 2.97) times more exchange finalized in June 2022 compared to June 2019, while there are only 1.48 more users that subscribed (including renewals).

(Exchanges : JUNE 2021: 375 865 vs JUNE 2019: 126 410) Subscriptions : JUNE 2021 : 3 262 vs JUNE 2019 : 2 203)

**People travel more, that means that those users should be happier with the solution and could be the best ambassador for Home Exchange. It could be interesting to increase reward for users that recommended the solution to friends.**.

Users Analysis¶

Based on both Subcription and Exchange tables, we want to study the comportment of users that subscribe to the solution and but not leave.

First, it is interesting to see that the number of users that registered but not subscribed is important.

Then, the impact of the number of exchanges and the typology of the users (if they are a guest or host, or both) on the churn rate will be studied.

NB: each user can leave or not several years, which give a mean churn rate by user. To calculate the "total" mean churn rate (calculated by the number of subscriptions and not by number of users), the mean churn rate will be calculated as a weighted average of the mean churn rate per user (weighted by the number of inscriptions of the user).

Register users that never subscribed to the solution¶

Almost 84% of registered users never subscribed to the solution.

It is an important number but not really surprising considering the solution offered by HomeExchange.

By looking at the number of conversations that users that never subscribed did, we can see that the large majority of register users try less than 15 conversations.

The number of conversations follows a decreasing exponential function. It is possible to estimate the number of conversations after which half of the users "disappear" (meaning that half of the initial users are not trying to find exchange after this number).

To do this, we can do a fit with a half-life functional:

Number of conversation for which the total number of users (that did not subscribed) reduce to half of its initial value is  8.9

After 9 conversations, half of the users are not trying to find exchange anymore.

It could be interesting to study in more depth why users to not subscribed, but it will not be done here as we are interested by the churn rate.

Subscribed user¶

  • 78% of users that susbcribed between 2019 and 2021 did at least one exchange, and 46% of those users did more than 5 exchanges between 2019 and 2022.

  • 22 % of users that subscribed between 2019 and 2021 did not use the solution at all. As expected, those users have a large mean churn rate equals to 51.4% as shown on the next chart :

The use of Guest Point (guest only) does not allow to convince users to stay another year especially for first subscriber (even after COVID).

On the other hand, if users do exchange their houses, even for a first subscription, they have a much smaller churn rate!

This is the solution that is working !

The number of conversations follows a decreasing exponential function. It is possible to estimate the number of conversations after which half of the users "disappear" (meaning that half of the initial users are not trying to find exchange after this number).

To do this, we can do a fit with a half-life function:

Number of conversation for which the total number of users (that did not exchange) reduce to half of its initial value : 12.5

Users that never exchanges and did less than 12 conversations have a largest churn rate than those who did more than 12 conversations. That behavior indicated that the churner are those who do not try a lot to find an exchange.

**Maybye HomeExchange could try to motivate them with some e-mail or recommendation (for example improve quality of house picture or description, alert when users city is on the top research on the web site etc...)**